[REPORT] Multicloud and on-premises data transfers at scale with AWS DataSync #AWSreInvent #STG353
I participated in the Builders' Session for AWS DataSync. In this post, I will briefly introduce this session.
Overview
Join this builders’ session to immerse yourself in the world of multi-cloud and on-premises data transfers. Learn how to configure and perform a data transfer from an on-premises NFS server and a publicly accessible Google Cloud Storage bucket that is hosting a public dataset to Amazon S3. AWS DataSync makes it fast and simple to migrate your data from other clouds or on-premises NFS servers to AWS as part of your business workflow. Walk away with a step-by-step guide on how to scale out DataSync tasks using multiple DataSync agents. You must bring your laptop to participate.
Report
Agenda
- Single DataSync task and agent
  - Google Cloud Storage to Amazon S3
  - On-premises to Amazon S3
- Multiple agents for a single task
  - Multiple agents per task
- Maximize bandwidth and copy large datasets with multiple tasks
  - Multiple tasks scale out agents
Workshop
The environment was prepared in advance by CloudFormation. I started by allowing inbound HTTP (port 80) from my IP address to the DataSync agent's security group, which is required for DataSync agent activation.
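As a reference, the same inbound rule can be added with boto3. This is a minimal sketch, assuming a hypothetical security group ID and a placeholder for your own public IP:

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical values: replace with the DataSync agent's security group ID
# and your own public IP address as a /32 CIDR.
AGENT_SG_ID = "sg-0123456789abcdef0"
MY_IP_CIDR = "198.51.100.10/32"

# Allow inbound HTTP (port 80) from my IP only; this port is used
# during DataSync agent activation.
ec2.authorize_security_group_ingress(
    GroupId=AGENT_SG_ID,
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 80,
            "ToPort": 80,
            "IpRanges": [
                {"CidrIp": MY_IP_CIDR, "Description": "DataSync agent activation"}
            ],
        }
    ],
)
```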
Activate DataSync agents
DataSync > Agents > Create agent
Two agents were created, but I did not have time to run the task with both of them.
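The console activation corresponds to the CreateAgent API. Here is a minimal boto3 sketch, assuming you have already obtained an activation key from the agent VM over port 80; the key below is purely hypothetical:

```python
import boto3

datasync = boto3.client("datasync")

# Hypothetical activation key retrieved from the agent VM over HTTP (port 80).
activation_key = "AAAAA-BBBBB-CCCCC-DDDDD-EEEEE"

# Register and activate the agent in this account and region.
response = datasync.create_agent(
    ActivationKey=activation_key,
    AgentName="Agent-1",
)
print(response["AgentArn"])
```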
Data transfer to AWS from Google Cloud Storage
In this case, we will transfer data from Google Cloud Storage to Amazon S3. We will use a single DataSync agent to start the DataSync task and observe the task metrics.
Check the Google Cloud Storage bucket
These are the files we will transfer.
Create DataSync task
DataSync > Tasks > Create task
Configure source location
- Source location options: Create a new location
- Location type: Object storage
- Agents: Agent-1
- Server: storage.googleapis.com
- Bucket name: gcp-public-data-arco-era5
- Folder: /co/single-level-reanalysis.zarr/
- Authentication: Requires credentials unchecked
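The source settings above map directly onto the CreateLocationObjectStorage API. A minimal boto3 sketch, assuming a hypothetical agent ARN (Google Cloud Storage is reached through its XML API endpoint over HTTPS):

```python
import boto3

datasync = boto3.client("datasync")

# Hypothetical ARN of the agent created earlier.
AGENT_ARN = "arn:aws:datasync:us-east-1:111122223333:agent/agent-0example1234567890"

source = datasync.create_location_object_storage(
    ServerHostname="storage.googleapis.com",  # GCS XML API endpoint
    ServerProtocol="HTTPS",
    ServerPort=443,
    BucketName="gcp-public-data-arco-era5",
    Subdirectory="/co/single-level-reanalysis.zarr/",
    AgentArns=[AGENT_ARN],
    # No AccessKey/SecretKey: the bucket hosts a public dataset,
    # so "Requires credentials" stays unchecked.
)
print(source["LocationArn"])
```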
Configure destination location
- Destination location options: Create a new location
- Location type: Amazon S3
- S3 bucket: datasync-s3-workshop
- S3 storage class: Standard
- Folder: gcp-to-s3-with-single-agent/
- IAM role: Click Autogenerate button
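The destination can be created the same way with CreateLocationS3. A minimal sketch, assuming a hypothetical account ID and an IAM role equivalent to the one the Autogenerate button creates:

```python
import boto3

datasync = boto3.client("datasync")

# Hypothetical ARNs: the workshop bucket and an IAM role that grants
# DataSync access to it (the console's Autogenerate button creates this role).
S3_BUCKET_ARN = "arn:aws:s3:::datasync-s3-workshop"
BUCKET_ACCESS_ROLE_ARN = "arn:aws:iam::111122223333:role/datasync-s3-access-role"

destination = datasync.create_location_s3(
    S3BucketArn=S3_BUCKET_ARN,
    Subdirectory="/gcp-to-s3-with-single-agent/",
    S3StorageClass="STANDARD",
    S3Config={"BucketAccessRoleArn": BUCKET_ACCESS_ROLE_ARN},
)
print(destination["LocationArn"])
```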
Configure settings
- Task Name: gcp-to-s3-with-single-agent
- Verify data: Verify only the data transferred
- Set bandwidth limit: Use available
Configure the data transfer as follows.
Under Specific files and folders, use Add pattern to copy only files under specific folders whose names begin with a specific prefix:
- /stl1/10*
- /stl2/10*
- /stl3/10*
- /stl4/10*
- Copy object tags: OFF
In Logging, click Autogenerate to create a CloudWatch log group and a resource policy that allows DataSync to write logs to it.
Review the settings and create the task with Create task.
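Putting the settings above together, this is a minimal boto3 sketch of the equivalent CreateTask call; the location and log group ARNs are the hypothetical ones from the previous steps:

```python
import boto3

datasync = boto3.client("datasync")

# Hypothetical ARNs carried over from the previous steps.
SOURCE_LOCATION_ARN = "arn:aws:datasync:us-east-1:111122223333:location/loc-0source1234567890"
DEST_LOCATION_ARN = "arn:aws:datasync:us-east-1:111122223333:location/loc-0dest123456789012345"
LOG_GROUP_ARN = "arn:aws:logs:us-east-1:111122223333:log-group:/aws/datasync:*"

task = datasync.create_task(
    Name="gcp-to-s3-with-single-agent",
    SourceLocationArn=SOURCE_LOCATION_ARN,
    DestinationLocationArn=DEST_LOCATION_ARN,
    CloudWatchLogGroupArn=LOG_GROUP_ARN,
    Options={
        "VerifyMode": "ONLY_FILES_TRANSFERRED",  # Verify only the data transferred
        "BytesPerSecond": -1,                    # Use available bandwidth
    },
    # Include filters: copy only files beginning with "10" under /stl1 to /stl4.
    # In the API the patterns are combined into one "|"-delimited string.
    Includes=[
        {
            "FilterType": "SIMPLE_PATTERN",
            "Value": "/stl1/10*|/stl2/10*|/stl3/10*|/stl4/10*",
        }
    ],
)
print(task["TaskArn"])
```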
Execute the DataSync task
When the task status becomes Available, click Start, and then choose Start with defaults.
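Starting with defaults corresponds to StartTaskExecution with no overrides. A minimal sketch using the hypothetical task ARN from above:

```python
import boto3

datasync = boto3.client("datasync")

# Hypothetical task ARN from the CreateTask call above.
TASK_ARN = "arn:aws:datasync:us-east-1:111122223333:task/task-0example1234567890"

# "Start with defaults" = no option or filter overrides.
execution = datasync.start_task_execution(TaskArn=TASK_ARN)
print(execution["TaskExecutionArn"])
```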
Once the task has started, we can check its progress under History.
We can see that the data throughput was approximately 202 MB/s, the transfer took about 6 minutes, and files were copied at a rate of about 209 files/second.
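The same progress information is available from the DescribeTaskExecution API. A minimal polling sketch, using the hypothetical execution ARN returned above:

```python
import time

import boto3

datasync = boto3.client("datasync")

# Hypothetical task execution ARN returned by StartTaskExecution.
EXECUTION_ARN = (
    "arn:aws:datasync:us-east-1:111122223333:"
    "task/task-0example1234567890/execution/exec-0example1234567890"
)

# Poll until the execution finishes, printing transfer counters as it runs.
while True:
    result = datasync.describe_task_execution(TaskExecutionArn=EXECUTION_ARN)
    print(
        f"status={result['Status']} "
        f"files={result.get('FilesTransferred', 0)} "
        f"bytes={result.get('BytesTransferred', 0)}"
    )
    if result["Status"] in ("SUCCESS", "ERROR"):
        break
    time.sleep(30)
```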
Checking the S3 bucket, we found that the data had been transferred as configured.
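Listing the destination prefix confirms the same thing from the SDK. A short sketch against the workshop bucket:

```python
import boto3

s3 = boto3.client("s3")

# List the objects DataSync wrote under the destination folder.
paginator = s3.get_paginator("list_objects_v2")
pages = paginator.paginate(
    Bucket="datasync-s3-workshop",
    Prefix="gcp-to-s3-with-single-agent/",
)
for page in pages:
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])
```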
Conclusion
The Builders' Session is a 60-minute workshop where you can easily get hands-on experience with AWS services, so when I attend re:Invent I always choose services I don't usually work with or ones I want to catch up on. The DataSync session was repeated many times, which suggests strong interest from people who want to learn about migration services for implementing migrations. Using AWS DataSync, we were able to experience a data transfer in just a few steps.